Properties of Principal Component Methods for Functional and Longitudinal Data Analysis by Peter Hall,
Authors
Abstract
The use of principal component methods to analyze functional data is appropriate in a wide range of different settings. In studies of “functional data analysis,” it has often been assumed that a sample of random functions is observed precisely, in the continuum and without noise. While this has been the traditional setting for functional data analysis, in the context of longitudinal data analysis a random function typically represents a patient, or subject, who is observed at only a small number of randomly distributed points, with nonnegligible measurement error. Nevertheless, essentially the same methods can be used in both these cases, as well as in the vast number of settings that lie between them. How is performance affected by the sampling plan? In this paper we answer that question. We show that if there is a sample of n functions, or subjects, then estimation of eigenvalues is a semiparametric problem, with root-n consistent estimators, even if only a few observations are made of each function, and if each observation is encumbered by noise. However, estimation of eigenfunctions becomes a nonparametric problem when observations are sparse. The optimal convergence rates in this case are those which pertain to more familiar function-estimation settings. We also describe the effects of sampling at regularly spaced points, as opposed to random points. In particular, it is shown that there are often advantages in sampling randomly. However, even in the case of noisy data there is a threshold sampling rate (depending on the number of functions treated) above which the rate of sampling (either randomly or regularly) has negligible impact on estimator performance, no matter whether eigenfunctions or eigenvectors are being estimated.
Similar resources
Modelling sparse generalized longitudinal observations with latent Gaussian processes
In longitudinal data analysis one frequently encounters non-Gaussian data that are repeatedly collected for a sample of individuals over time. The repeated observations could be binomial, Poisson or of another discrete type or could be continuous. The timings of the repeated measurements are often sparse and irregular. We introduce a latent Gaussian process model for such data, establishing a co...
Modeling Sparse Generalized Longitudinal Observations With Latent Gaussian Processes
In longitudinal data analysis one frequently encounters non-Gaussian data that are repeatedly collected for a sample of individuals over time. The repeated observations could be binomial, Poisson or of another discrete type or could be continuous. The timings of the repeated measurements are often sparse and irregular. We introduce a latent Gaussian process model for such data, establishing a c...
Asymptotic Distributions of Estimators of Eigenvalues and Eigenfunctions in Functional Data
Functional data analysis is a relatively new and rapidly growing area of statistics. This is partly due to technological advancements which have made it possible to generate new types of data that are in the form of curves. Because the data are functions, they lie in function spaces, which are of infinite dimension. To analyse functional data, one way, which is widely used, is to employ princip...
Assessing Extrema of Empirical Principal Component Functions by Peter Hall
The difficulties of estimating and representing the distributions of functional data mean that principal component methods play a substantially greater role in functional data analysis than in more conventional finite-dimensional settings. Local maxima and minima in principal component functions are of direct importance; they indicate places in the domain of a random function where influence on ...